
db: refactor compaction splitting to reduce key comparisons #2259

Merged
merged 1 commit into cockroachdb:master from frontiers on Feb 8, 2023

Conversation

jbowens
Collaborator

@jbowens jbowens commented Jan 20, 2023

Introduce a new type frontiers, designed to monitor several different user
key frontiers during a compaction. When a user key is encountered that equals
or exceeds the configured frontier, the code that specified the frontier is
notified and given an opportunity to set a new frontier. Internally,
frontiers uses a heap (code largely copied from the merging iterator's heap)
to avoid N key comparisons for every key.
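
The mechanism described above can be sketched as a small min-heap of monitored keys. This is a hypothetical simplification for illustration, not the actual Pebble implementation; the names `frontier`, `push`, and `advance` are illustrative, and only the core idea — O(log n) heap maintenance instead of n comparisons per key — is taken from the description:

```go
package main

import (
	"bytes"
	"fmt"
)

// frontier pairs a monitored user key with a callback invoked when the
// key is reached or surpassed. reached returns the next key to monitor,
// or nil to remove the frontier.
type frontier struct {
	key     []byte
	reached func(k []byte) []byte
}

// frontiers keeps a min-heap of frontier pointers ordered by key, so that
// advancing past a user key costs O(log n) comparisons rather than one
// comparison per monitored frontier.
type frontiers struct {
	items []*frontier
}

func (f *frontiers) less(i, j int) bool {
	return bytes.Compare(f.items[i].key, f.items[j].key) < 0
}

// push adds a frontier to the heap, sifting it up into position.
func (f *frontiers) push(ff *frontier) {
	f.items = append(f.items, ff)
	i := len(f.items) - 1
	for i > 0 {
		p := (i - 1) / 2
		if !f.less(i, p) {
			break
		}
		f.items[i], f.items[p] = f.items[p], f.items[i]
		i = p
	}
}

// advance invokes reached for every frontier whose key is <= k, replacing
// its key with the callback's return value (removing it when nil). Note
// that a callback returning a key <= k is re-invoked, per the contract
// discussed in the review below.
func (f *frontiers) advance(k []byte) {
	for len(f.items) > 0 && bytes.Compare(f.items[0].key, k) <= 0 {
		next := f.items[0].reached(k)
		if next == nil {
			// Remove the root: move the last element up, then sift down.
			n := len(f.items) - 1
			f.items[0] = f.items[n]
			f.items = f.items[:n]
		} else {
			f.items[0].key = next
		}
		f.siftDown(0)
	}
}

func (f *frontiers) siftDown(i int) {
	for {
		l, r := 2*i+1, 2*i+2
		smallest := i
		if l < len(f.items) && f.less(l, smallest) {
			smallest = l
		}
		if r < len(f.items) && f.less(r, smallest) {
			smallest = r
		}
		if smallest == i {
			return
		}
		f.items[i], f.items[smallest] = f.items[smallest], f.items[i]
		i = smallest
	}
}

func main() {
	var f frontiers
	var hits []string
	f.push(&frontier{key: []byte("c"), reached: func(k []byte) []byte {
		hits = append(hits, "c@"+string(k))
		return nil
	}})
	f.push(&frontier{key: []byte("m"), reached: func(k []byte) []byte {
		hits = append(hits, "m@"+string(k))
		return nil
	}})
	f.advance([]byte("d")) // only the "c" frontier is reached
	fmt.Println(hits)      // [c@d]
}
```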

This commit refactors the limitFuncSplitter type to make use of frontiers.
The limitFuncSplitter type is used to split flushes to L0 flush split keys,
and to split both flushes and compactions to avoid excessive overlap with
grandparent files.

This change is motivated by #2156, which will introduce an additional
compaction-output splitter that must perform key comparisons against the next
key to decide when to split a compaction. Additionally, the frontiers type
may also be useful for other uses, such as applying key-space-dependent logic
during a compaction (eg, compaction-time GC, disaggregated storage locality
policies, or keyspan-restricted snapshots #1810).


@jbowens jbowens force-pushed the frontiers branch 3 times, most recently from 7c15acc to 6ac10ed on January 22, 2023 18:10
Collaborator Author

@jbowens jbowens left a comment


Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion


compaction_iter.go line 781 at r1 (raw file):

	i.key.Trailer = i.iterKey.Trailer
	i.keyTrailer = i.iterKey.Trailer
	i.frontiers.advance(i.key.UserKey)

I think ideally these frontiers would actually be incorporated into the merging iterator heap, to further reduce key comparisons and to detect that the frontier was reached before the compaction iterator logic executes. I'm not sure how to integrate it cleanly though, so I opted for this initial baby step which is good enough for avoiding introducing extra per-key comparisons in implementing the boundary alignment heuristic (#2156).

@jbowens jbowens force-pushed the frontiers branch 3 times, most recently from 269d555 to b881c50 on January 22, 2023 18:26
@jbowens jbowens requested review from a team and RaduBerinde January 23, 2023 00:00
Collaborator

@sumeerbhola sumeerbhola left a comment


(drive-by)

Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @jbowens and @RaduBerinde)


compaction.go line 191 at r2 (raw file):

		// split point.
		if startKey := lf.c.rangeDelFrag.Start(); startKey != nil {
			lf.limit = lf.limitFunc(startKey)

happy to see this code move into DB.compact1.


compaction_iter.go line 914 at r2 (raw file):

// surpassed, the frontier's reached method is invoked with the key that reached
// the frontier. During the execution of reached, a frontier implementation may
// update the value of its `key`. If the `key` method returns nil, the frontier

clarification on "may update":
it's not going to update the parameter passed to reached(key []byte), and this update just means that future calls to key() may return a different key than before, yes?

Also is there a constraint that the reached-key passed in reached is >= the last returned frontier-key?


compaction_iter.go line 931 at r2 (raw file):

// compaction is about to surpass a key may add a frontier, with a `reached`
// function that will be invoked when the key is about to be reached or
// surpassed.

I guess the "about to be reached" clarifies my previous question. But I think it is worth being very explicit in the specification comment.

@bananabrick bananabrick self-requested a review January 24, 2023 20:29
@jbowens jbowens force-pushed the frontiers branch 2 times, most recently from d5c15fc to 2d28811 on January 31, 2023 22:41
Collaborator Author

@jbowens jbowens left a comment


Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)


compaction.go line 191 at r2 (raw file):

Previously, sumeerbhola wrote…

happy to see this code move into DB.compact1.

🙌


compaction_iter.go line 914 at r2 (raw file):

Previously, sumeerbhola wrote…

clarification on "may update":
it's not going to update the parameter passed in reached(key []byte) and this update just means that future calls to key() may return a different key than before, yes?

Also is there a constraint that the reached-key passed in reached is >= the last returned frontier-key?

I made a few comment updates to make this all a little more explicit.


compaction_iter.go line 931 at r2 (raw file):

Previously, sumeerbhola wrote…

I guess the "about to be reached" clarifies my previous question. But I think it is worth being very explicit in the specification comment.

I think this is done now, but lemme know if it's still unclear.

@jbowens jbowens force-pushed the frontiers branch 2 times, most recently from c9a4ccf to c9cf2eb on February 2, 2023 14:49
@jbowens jbowens requested a review from a team February 2, 2023 17:32
Member

@RaduBerinde RaduBerinde left a comment


Reviewable status: 0 of 5 files reviewed, 9 unresolved discussions (waiting on @bananabrick, @jbowens, and @sumeerbhola)


compaction_iter.go line 926 at r4 (raw file):

// value.
//
// A frontier's `key` must be stable between calls to `reached`. If a frontier

Requiring this of the implementor is fragile. And we must warn against using advance() (or really any other part of the API) between the key() changing and update() being called, because the heap is internally broken at that time. In general, I'd avoid having a data structure where we expect the internal invariants to be broken at times.

A more robust way would be to pass the first key of interest in push() and update() and have reached() return the next key of interest. Or change the API so we always remove the frontier when the key is reached and the caller has to push() it again. I doubt that a Fix instead of a Pop and a Push is that much faster. The API can become as simple as push(key []byte, reached func())


compaction_iter.go line 1001 at r4 (raw file):

// adds the frontier if not already contained with the heap, and fixes up its
// position if it already is.
func (f *frontiers) update(ff frontier) {

[nit] It would be nice to differentiate between functions that are part of the "interface" and internal-only functions. The former can be uppercased even if the type is not exported. We can also consider moving this to a separate package


compaction_iter.go line 1003 at r4 (raw file):

func (f *frontiers) update(ff frontier) {
	hasKey := ff.key() != nil
	for i := 0; i < len(f.items); i++ {

If necessary, we can make this logarithmic by maintaining a map[frontier]int (map from frontier to index, updated in Swap). (maybe leave it as a TODO).

If linear time is acceptable for update and push, we might as well maintain a sorted array instead of a heap. Which could actually reduce the number of key comparisons (we don't need any to "pop", we only binary search when we insert).


compaction_iter.go line 1022 at r4 (raw file):

// push adds the provided frontier to the set of frontiers. If the provided
// frontier is already in the heap, it will be added again and will receive

If we allow a frontier to be added multiple times, we have to explain what happens when we call update after that.


compaction_iter.go line 1043 at r4 (raw file):

}

// fix, up and down are copied from the go stdlib.

Why aren't we using Go's stdlib? If the performance difference of calling Less and Swap through an interface is important, we could get that back by storing the key and avoiding the two interface calls to .key() on every comparison.

Collaborator Author

@jbowens jbowens left a comment


Reviewable status: 0 of 5 files reviewed, 8 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)


compaction_iter.go line 926 at r4 (raw file):

Previously, RaduBerinde wrote…

Requiring this of the implementor is fragile. And we must warn against using advance() (or really any other part of the API) between the key() changing and update() being called, because the heap is internally broken at that time. In general, I'd avoid having a data structure where we expect the internal invariants to be broken at times.

A more robust way would be to pass the first key of interest in push() and update() and have reached() return the next key of interest. Or change the API so we always remove the frontier when the key is reached and the caller has to push() it again. I doubt that a Fix instead of a Pop and a Push is that much faster. The API can become as simple as push(key []byte, reached func())

Heh, I actually started with exactly the push(key []byte, reached func()) interface, but I needed the ability to update an existing frontier's key (eg, the update operation). With just push(key []byte, reached func()) there's no way to identify which element you want to update during update (there's no guarantee that key()s are unique). I've updated the implementation to use the pointer of the frontier struct for this purpose.


compaction_iter.go line 1001 at r4 (raw file):

Previously, RaduBerinde wrote…

[nit] It would be nice to differentiate between functions that are part of the "interface" and internal-only functions. The former can be uppercased even if the type is not exported. We can also consider moving this to a separate package

I don't think there's enough here to warrant a separate package, but I've uppercased the entry points to made it clear.


compaction_iter.go line 1003 at r4 (raw file):

Previously, RaduBerinde wrote…

If necessary, we can make this logarithmic by maintaining a map[frontier]int (map from frontier to index, updated in Swap). (maybe leave it as a TODO).

If linear time is acceptable for update and push, we might as well maintain a sorted array instead of a heap. Which could actually reduce the number of key comparisons (we don't need any to "pop", we only binary search when we insert).

I suspect using a map would be a regression, given the small n. The linear part of this runtime only performs pointer equality checks, which are much cheaper than key comparisons or memory swaps.
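The pointer-identity lookup described here can be sketched as follows. This is a hypothetical simplification: the real code fixes up a heap in place, whereas this sketch re-sorts for brevity, and the type and method names are illustrative. The point it demonstrates is that locating the element needs only pointer equality, never key comparisons:

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// frontier is a stand-in for the real type; its identity is the pointer,
// not the key, since keys need not be unique.
type frontier struct{ key []byte }

type frontiers struct{ items []*frontier }

// update locates ff by pointer equality (a cheap linear scan over small n)
// and repositions it: a frontier with a nil key is removed, a frontier not
// yet present is inserted, and an existing one is moved to its new position.
func (f *frontiers) update(ff *frontier) {
	idx := -1
	for i := range f.items {
		if f.items[i] == ff { // pointer comparison only, no key comparison
			idx = i
			break
		}
	}
	switch {
	case ff.key == nil && idx >= 0:
		f.items = append(f.items[:idx], f.items[idx+1:]...)
	case ff.key != nil && idx < 0:
		f.items = append(f.items, ff)
	}
	// The real implementation fixes up a heap; re-sorting keeps the
	// sketch short while preserving the ordering invariant.
	sort.Slice(f.items, func(i, j int) bool {
		return bytes.Compare(f.items[i].key, f.items[j].key) < 0
	})
}

func main() {
	a := &frontier{key: []byte("m")}
	b := &frontier{key: []byte("c")}
	var f frontiers
	f.update(a)
	f.update(b)
	a.key = []byte("a") // change a's key, then reposition it by identity
	f.update(a)
	fmt.Println(string(f.items[0].key)) // a
}
```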


compaction_iter.go line 1043 at r4 (raw file):

Previously, RaduBerinde wrote…

Why aren't we using Go's stdlib? If the performance difference of calling Less and Swap through an interface is important, we could get that back by storing the kay and avoiding the two interface calls to .key() on every comparison..

Agreed, we should remove the interface indirection of the frontier. Made that change.

We've seen the interface indirection have a significant enough impact on perf that we maintain copied concrete heap types for the merging iterator, level checker, and keyspan merging iterator.

Member

@RaduBerinde RaduBerinde left a comment


Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @bananabrick, @jbowens, and @sumeerbhola)


compaction_iter.go line 926 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Heh, I actually started with exactly the push(key []byte, reached func()) interface, but I needed the ability to update an existing frontier's key (eg, the update operation). With just push(key []byte, reached func()) there's no way to identify which element you want to update during update (there's no guarantee that key()s are unique). I've updated the implementation to use the pointer of the frontier struct for this purpose.

This is much cleaner, thanks!


compaction_iter.go line 927 at r5 (raw file):

// invocation, through its Update method.
type frontier struct {
	parent *frontiers

[nit] parent is a bit confusing (my first thought was parent inside the heap), maybe container or just frontiers


compaction_iter.go line 940 at r5 (raw file):

	// After reached is invoked, the frontier's key is updated to the return
	// value of `reached`. Note bene, the frontier is permitted to update its
	// key to a user key ≤ the argument `key`:

[nit] remove : or reformat the next paragraph as a subblock


compaction_iter.go line 942 at r5 (raw file):

	// key to a user key ≤ the argument `key`:
	//
	// If a frontier is set to key k1, and reached(k2) is invoked (k2≥k1),

[nit] space around >= for consistency

@jbowens jbowens force-pushed the frontiers branch 2 times, most recently from 7e66c39 to a008564 on February 3, 2023 14:53
Collaborator Author

@jbowens jbowens left a comment


Reviewable status: 0 of 5 files reviewed, 5 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)


compaction_iter.go line 927 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] parent is a bit confusing (my first thought was parent inside the heap), maybe container or just frontiers

went with container


compaction_iter.go line 940 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] remove : or reformat the next paragraph as a subblock

Done.


compaction_iter.go line 942 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] space around >= for consistency

Done.

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 1 of 5 files at r1, 2 of 4 files at r5, 1 of 1 files at r6, 1 of 1 files at r7.
Reviewable status: all files reviewed (commit messages unreviewed), 11 unresolved discussions (waiting on @bananabrick, @jbowens, and @RaduBerinde)


compaction.go line 176 at r7 (raw file):

		// point must be ahead of the previous flush split point.
		limit := lf.limitFunc(key)
		lf.frontier.Update(limit)

why is this calling Update -- doesn't that duplicate the work of fixing up the heap that Advance will do?


compaction.go line 179 at r7 (raw file):

		return limit
	}
	lf.frontier.Update(nil)

same question


compaction.go line 223 at r7 (raw file):

// limit key.) If a wrapped splitter advises a split, it must continue
// to advise a split until a new output.
type userKeyChangeSplitter struct {

why are userKeyChangeSplitter and splitterGroup not using frontiers? Is that for the next PR?


compaction_iter.go line 1034 at r7 (raw file):

		// This frontier has been reached. Invoke the closure and update with
		// the next frontier.
		f.items[0].key = f.items[0].reached(k)

I am confused about what prevents this from looping forever if the situation mentioned in the earlier comment happens:

	// If a frontier is set to key k1, and reached(k2) is invoked (k2 ≥ k1),
	// reached is permitted to return `k1`. However, it will receive reached(k2)
	// calls until it returns nil or a key `k3` such that k2 < k3. This property
	// is useful for frontiers that use `reached` invocations to drive iteration
	// through InternalKeys which may contain the same user key multiple times.

I thought that the goal was for Advance to return and be called again when the compaction steps to the next InternalKey.


testdata/frontiers line 4 at r7 (raw file):

b e j
a p n z

why this empty line?


compaction_iter_test.go line 282 at r7 (raw file):

	var keySets [][][]byte
	datadriven.RunTest(t, "testdata/frontiers", func(t *testing.T, td *datadriven.TestData) string {
		switch td.Cmd {

Can you add a comment about what "init" is specifying, so that a casual reader does not need to read initTestFrontier.

Collaborator Author

@jbowens jbowens left a comment


Reviewable status: 3 of 5 files reviewed, 11 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)


compaction.go line 176 at r7 (raw file):

Previously, sumeerbhola wrote…

why is this calling Update -- doesn't that duplicate the work of fixing up the heap that Advance will do?

The max-grandparent-overlap splitter needs to update its frontier key when a new output is reached, even if it didn't request splitting to a new output. This is outside the context of a reached call. Advance will only need to fix up this frontier's position if this splitter individually did request the split, in which case we'll need to pop the heap root during reached and then re-insert using Update during onNewOutput.

I suppose we could reduce the amount of work when this splitter requests a split by optimistically calculating the next overlap key in reached assuming our split request will be immediately respected. The userKeyChangeSplitter complicates all this because a splitNow request may be ignored if not at a new user key. I think we should take a pass through cleaning up the compactionOutputSplitter interface once we're able to remove support for split user keys altogether.


compaction.go line 223 at r7 (raw file):

Previously, sumeerbhola wrote…

why are userKeyChangeSplitter and splitterGroup not using frontiers? Is that for the next PR?

The user key change splitter would need to materialize an immediate successor key to use as a frontier. This would need to be done on every key. Currently, we only perform a key comparison when the wrapped splitter requests a split. Also, separately, the user key change splitter is obsolete, but we can't remove it until we drop support for the format major versions that do allow split user keys. I'd like to do that and remove it, but we also need documentation so that non-Cockroach users understand what they need to do to migrate to master.

The splitterGroup doesn’t perform any key comparisons per-compaction key itself. The only time it needs to do any is in onNewOutput. (An aside, I think this interface can be simplified especially once we remove userKeyChangeSplitter. It's wonky that onNewOutput returns the splitterSuggestion for the new file.)


compaction_iter.go line 1034 at r7 (raw file):

Previously, sumeerbhola wrote…

I am confused about what prevents this from looping forever if the situation mentioned in the earlier comment happens:

	// If a frontier is set to key k1, and reached(k2) is invoked (k2 ≥ k1),
	// reached is permitted to return `k1`. However, it will receive reached(k2)
	// calls until it returns nil or a key `k3` such that k2 < k3. This property
	// is useful for frontiers that use `reached` invocations to drive iteration
	// through InternalKeys which may contain the same user key multiple times.

I thought that the goal was for Advance to return and be called again when the compaction steps to the next InternalKey.

The code setting the frontier may not be a pure function of the input key. In particular, this is intended for the grandparent boundary heuristic. As long as we still need to support split user keys, there may be multiple grandparent files whose smallest keys share the same user key. The reached invocations will drive iteration through the grandparents. While they may return the same user key multiple times, it's limited to the number of times the user key is the smallest key of a grandparent file.


testdata/frontiers line 4 at r7 (raw file):

Previously, sumeerbhola wrote…

why this empty line?

It configures an empty frontier (ensuring an Init(_, nil, _) does not add the frontier to the heap). Added a comment up above.


compaction_iter_test.go line 282 at r7 (raw file):

Previously, sumeerbhola wrote…

Can you add a comment about what "init" is specifying, so that a casual reader does not need to read initTestFrontier.

Done.

Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 2 of 2 files at r8, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @bananabrick, @jbowens, and @RaduBerinde)


compaction.go line 176 at r7 (raw file):

I think we should take a pass through cleaning up the compactionOutputSplitter interface once we're able to remove support for split user keys altogether.

Why would we remove userKeyChangeSplitter? Won't we always need it to convert file size splitting into splitting at user key boundaries?

I missed some subtleties in what is happening here, in that frontiers.Advance happens inside the compactionIter when it steps and before we ask the compactionOutputSplitter about splitting. Of course it can't be after, for correctness reasons.

  • Will there ever be a frontier.reached that does not return nil? If not, why complicate the interface?

splitter requests a split by optimistically calculating the next overlap key in reached assuming our split request will be immediately respected

I now see the difficulty here.

This is a rough thought:

  • We have 2 kinds of splitters: based on (a) key frontier, (b) some other characteristic
  • (a) only wants to be queried about shouldSplitBefore when the key frontier is reached. (b) wants to be queried for each key.
  • both (a) and (b) want to be told about a split via onNewOutput.

Instead of rolling all of these needs into compactionOutputSplitter, you have split the frontier part into a separate interface where the decision for (a) in shouldSplitBefore is moved into reached, and the decision is just returned in shouldSplitBefore. This is quite clean -- I can't think of any obvious improvement.

A code comment along those lines would help a reader understand the why of this dance between the compactionOutputSplitter and frontier.


compaction.go line 223 at r7 (raw file):

The user key change splitter would need to materialize an immediate successor key to use as a frontier. This would need to be done on every key. Currently, we only perform a key comparison when the wrapped splitter requests a split.

This could use a code comment too. If we did this, I suppose it would function something like:

  • nil frontier for user key change splitter for most of the keys
  • occasional shouldSplitBefore's will call frontier.Update with the immediate successor of the current user key, which will likely be reached very soon
  • The number of comparisons we will save is small since this key will be reached soon (and likely none since it will be the top of the heap).

compaction_iter.go line 1034 at r7 (raw file):

As long as we still need to support split user keys, there may be multiple grandparent files whose smallest keys share the same user key. The reached invocations will drive iteration through the grandparents. While they may return the same user key multiple times, it's limited to the number of times the user key is the smallest key of a grandparent file.

Thanks for the explanation. Can you add a code comment stating why this feature is being supported, so we know when we could in theory remove it.

I suppose the new grandparent output splitter will only set a frontier key via Update once a certain output file size is reached, and then needs to keep changing it every time we move past the current grandparent file. I see now why it is desirable to return a key in reached, though it still seems like a shorthand to call Update within the reached implementation, except for the part that we don't need to iterate over the elements of the heap to find the relevant entry.
Since we are writing our own heap logic, we could add a frontier.heapIndex and eliminate this search. I don't have a strong opinion, so your choice.

Collaborator Author

@jbowens jbowens left a comment


Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)


compaction.go line 176 at r7 (raw file):

Why would we remove userKeyChangeSplitter? Won't we always need it to convert file size splitting into splitting at user key boundaries?

Sorry, my brain broke. You're right, we can't remove userKeyChangeSplitter.

Will there ever be a frontier.reached that does not return nil? If not, why complicate the interface?

Yeah, the file size splitter for implementing #2156 will know about grandparents, and iteration through the grandparents will be driven by the reached calls. Every reached call will return the next grandparent's smallest key. But the same file size splitter may need to split at a key before the next reached call if the current output is getting too big and no grandparent boundaries have been encountered.

I think we should revisit the compactionOutputSplitter in that PR, when we can see all the new usages laid out in front of us and see what we can tease out.


compaction.go line 223 at r7 (raw file):

Previously, sumeerbhola wrote…

The user key change splitter would need to materialize an immediate successor key to use as a frontier. This would need to be done on every key. Currently, we only perform a key comparison when the wrapped splitter requests a split.

This could use a code comment too. If we did this, I suppose it would function something like:

  • nil frontier for user key change splitter for most of the keys
  • occasional shouldSplitBefore's will call frontier.Update with the immediate successor of the current user key that will be likely be reached very soon
  • The number of comparisons we will save is small since this key will be reached soon (and likely none since it will be the top of the heap).

added a comment


compaction_iter.go line 1034 at r7 (raw file):

Previously, sumeerbhola wrote…

As long as we still need to support split user keys, there may be multiple grandparent files whose smallest keys share the same user key. The reached invocations will drive iteration through the grandparents. While they may return the same user key multiple times, it's limited to the number of times the user key is the smallest key of a grandparent file.

Thanks for the explanation. Can you add a code comment stating why this feature is being supported, so we know when we could in theory remove it.

I suppose the new grandparent output splitter will only set a frontier key via Update once a certain output file size is reached, and then needs to keep changing it every time we move past the current grandparent file. I see now why it is desirable to return a key in reached, though it still seems like a shorthand to call Update within the reached implementation, except for the part that we don't need to iterate over the elements of the heap to find the relevant entry.
Since we are writing our own heap logic, we could add a frontier.heapIndex and eliminate this search. I don't have a strong opinion, so your choice.

I adjusted the code comment on the reached field to try to make this a little clearer. I removed the language about "reached is permitted to return k1" since it seems to confuse more than anything. I think the interface of receiving reached(k2) calls until reached returns a key k3 such that k2 < k3 is clean and will still be necessary even when we have a guarantee that there are no split user keys (because there may be multiple grandparents smallest keys < k3).
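
The contract described here — reached(k) is re-invoked until it returns nil or a key strictly greater than k — can be illustrated for a single frontier. This is a hypothetical sketch (the function name `advanceOne` and the grandparent keys are made up); it shows why the loop terminates even when duplicate smallest user keys drive multiple reached calls:

```go
package main

import (
	"bytes"
	"fmt"
)

// advanceOne models the advance loop for one frontier: reached(k) is
// invoked repeatedly until it returns nil or a key strictly greater
// than k. The loop is bounded because each call consumes one element
// of whatever sequence the callback is iterating through.
func advanceOne(key, k []byte, reached func([]byte) []byte) []byte {
	for key != nil && bytes.Compare(key, k) <= 0 {
		key = reached(k)
	}
	return key
}

func main() {
	// Hypothetical grandparent smallest keys, with a duplicated user key
	// as permitted by split user keys.
	grandparents := [][]byte{[]byte("c"), []byte("c"), []byte("f")}
	i := 0
	reached := func(k []byte) []byte {
		i++ // step to the next grandparent file
		if i >= len(grandparents) {
			return nil
		}
		return grandparents[i]
	}
	// Advancing to "c" invokes reached twice (once per duplicate "c"),
	// then stops at "f" because "f" > "c".
	next := advanceOne(grandparents[0], []byte("c"), reached)
	fmt.Printf("%s %d\n", next, i) // f 2
}
```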

Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm_strong:

Reviewed 3 of 3 files at r9, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @bananabrick, @jbowens, and @RaduBerinde)


compaction.go line 246 at r9 (raw file):

	//
	// We could implement this splitter using frontiers: When the inner splitter
	// requests a split beofre key `k`, we'd update a frontier to be

nit: before

Collaborator Author

@jbowens jbowens left a comment


TFTRs!

Reviewable status: 4 of 5 files reviewed, 12 unresolved discussions (waiting on @bananabrick, @RaduBerinde, and @sumeerbhola)

@jbowens jbowens added this pull request to the merge queue Feb 8, 2023
Merged via the queue into cockroachdb:master with commit 2f086b7 Feb 8, 2023
@jbowens jbowens deleted the frontiers branch February 8, 2023 23:46